[57] ABSTRACT

A method of processor architecture modification includes the step of defining an instruction set architecture for a current generation processor. Reserved within this definition are a set of instructions which perform no-operations (NOPs), have no attached semantics, and do not change any architectural state of the processor. Software code, compatible with the instruction set architecture, is then written for running on the processor. In a next-generation processor, new functionality is added by defining a semantic for one of the NOPs so that programs which utilize the new instruction run on both the current-generation and the next-generation processors.

13 Claims, 2 Drawing Sheets 
METHOD OF MODIFYING AN INSTRUCTION SET ARCHITECTURE OF A COMPUTER PROCESSOR TO MAINTAIN BACKWARD COMPATIBILITY

FIELD OF THE INVENTION

The present invention relates generally to the field of computers and microprocessors. More particularly, the invention relates to the design and modification of an instruction set architecture of a processor.

BACKGROUND OF THE INVENTION

The world has witnessed remarkable advances over the past several decades in the field of computer design and architecture. One of the most significant innovations during that time was the development of the microprocessor, which is essentially a computer self-contained on a silicon chip.

Microprocessors have made it possible to execute computer programs and other software routines for a myriad of industrial applications. To do this, a processor executes instructions defined by its instruction set architecture (ISA). The ISA of a particular processor defines the set of available instructions that may be incorporated into a program for execution on the processor. When writing a program to be run on a certain processor, programmers must first familiarize themselves with the set of instructions defined in the ISA, and then construct an ordered sequence of particular instructions to carry out tasks and operations which accomplish the desired goal. By way of example, following the introduction of the Intel® 8086 microprocessor, a large number of computer programs were written specifically for running on the 8086 processor, that is, in accordance with the ISA defined by that machine.

Ever since the introduction of the first microprocessor, there has been a problem with the rapid obsolescence of software which supports an existing generation of machine. Technological advances tend to occur quickly, so that in a very short time an existing or current generation processor is surpassed by a new, improved version of the same processor, i.e., the next generation processor. As a result, software developers continually face the difficulty of writing code that takes advantage of new features and performance capabilities of the processor hardware.

The problem is that for each new generation of microprocessor, new instructions are added to the instruction set architecture that implement new operations. Software developers, however, commonly refrain from employing these new instructions for many years. The reason is that using a new instruction generally makes their code incompatible with previous or current generation processors. At the time a new processor is introduced, the previous or current generation processors typically have widespread acceptance and usage in the marketplace. Therefore, it makes good business sense to write programs that remain compatible with present machines, at least until a large number of users migrate to the newer processor. Unfortunately, this means that many advanced features and capabilities go largely unused for quite some time. This, of course, impedes further technological advancement.

To illustrate how this problem has slowed progress in the computer arts, one may consider the Intel i486™ processor, which was introduced in 1989. The Intel i486™ processor was an improvement of the widely popular Intel i386™ processor. One of the improvements was the introduction of a new instruction known as the BSWAP instruction. The BSWAP instruction satisfied a long-felt need in the computer field for performing rapid byte swapping operations on 32-bit quantities. With the BSWAP instruction a user could quickly convert little/big endian data in a register to the opposite (i.e., big/little) endian form.

Although the BSWAP instruction was heralded as an important innovation, much of the software written for the i486™ processor did not use this instruction for the single reason that it was not supported by the i386™ processor. In other words, programmers were reluctant to use the BSWAP instruction because doing so would render their code incompatible with the already large installed base of computer software written for i386™ processors. Only recently, well after the introduction of the i486™ processor, have software developers begun to take advantage of the BSWAP instruction.

Therefore, what is needed is a solution to this problem so that once a new processor is introduced, software can immediately be written for that processor that takes advantage of new instructions while remaining compatible with software written for previous or current generation processors.

SUMMARY OF THE INVENTION

In accordance with the present invention, a method of processor architecture modification is provided for increased performance based on greater software compatibility and extensions. In one embodiment, the method includes the step of first defining an instruction set architecture for a current generation processor. Within this definition are reserved a set of "hintable" instructions which perform no-operations (NOPs). Each of the "hintable" NOPs has a unique opcode. From an architectural standpoint, the "hintable" NOPs have no attached semantics and do not alter an architectural state of the processor. This is true as long as the instruction set architecture is in use on that processor.

Over the course of time, software developers and programmers will inevitably write code for execution on the processor. Of course, these programs must be compatible with the instruction set architecture running on the processor. After a period of time a large base of software programs will be established.

When a next generation processor is developed, innovative new features are implemented by adding new instructions to the ISA. According to the invention, an associated semantic is defined for at least one of the "hintable" NOPs to implement the new instruction in the next generation processor. In this manner, software code may use the hintable NOP instruction to run programs on the next generation processor. In other words, the hintable NOPs are compatible with earlier versions of the processor. The same program is also compatible with (i.e., can run on) the current generation processor. Thus, newly generated code using the new instruction (or instructions) is able to run on both the new and old processor generations.

BRIEF DESCRIPTION OF THE DRAWINGS

The novel features which are characteristic of the present invention are set forth in the appended claims. The invention itself, however, as well as other features and advantages, is best understood with reference to the detailed description which follows, read in conjunction with the accompanying drawings wherein:

FIG. 1 illustrates improvements in processor architecture and software development over the course of time.

FIG. 2 illustrates the definition of hintable NOPs for an instruction set architecture according to one embodiment of the present invention.
FIG. 3 illustrates a next generation processor implementation of new instructions by attaching semantics to NOPs added to the previous instruction set architecture.

DETAILED DESCRIPTION

As improvements are made in the development of future generation microprocessors, software compatibility issues naturally arise. Compatibility means that, within certain limited constraints, programs that execute on any previous generations of compatible microprocessors will produce identical results when executed on a next generation processor. There are, however, slightly different implementations of architectural features. More importantly, newer processors typically add functionality through innovative instructions and other features to the instruction set architecture which did not exist in previous generations. As a result, software developers intent upon writing programs which utilize these new instructions must accept, as a consequence, that such programs would be incompatible with prior generation processors. To avoid this problem, many software vendors have refrained from including new instructions and exploiting architectural features in their code until existing microprocessor families have obsolesced.

An example of how instruction set architectures change with improvements in microprocessor design is found in the evolution of the Intel processor architecture. In 1979, Intel introduced the 8086 and the 8088 CPUs which implemented a basic, rudimentary instruction set architecture. This was followed in 1982 by the introduction of the 80286 CPU. In 1985, the i386™ processor provided a variety of new instructions; these included LSS, LFS, LGS instructions, bit scan instructions, double-shift instructions, and a generalized multiply instruction.

Several years later, in 1989, the i486™ processor introduced three new application instructions: BSWAP instruction, for converting from little/big endian to big/little endian formats; XADD instruction, which loads the destination register dest into the source register src and then loads the sum of dest and the original value of src into dest; and CMPXCHG compare exchange instruction, which compares the accumulator with dest: if they are equal src is loaded into dest, otherwise dest is loaded into the accumulator. The i486™ processor also introduced three new system instructions (INVD, WBINVD, INVLPG) for managing the cache and translation look-aside buffer (TLB).

Most recently, the Pentium™ processor was introduced in 1993. The Pentium™ processor offered a variety of new instructions for 32-bit machines.

FIG. 1 provides an example of how processor architecture and software code development evolve over time. In FIG. 1, the notations P1, P2, etc., refer to successive generations of microprocessors. Each new generation builds on the previous generation by adding new instructions and features to the instruction set architecture incorporated in the prior generation machine. Thus, the notation ISA1 refers to the instruction set architecture developed for, and compatible with, processor P1.

Processor P2 has an associated instruction set architecture ISA2 consisting of the previous generation architecture (ISA1) plus some new functionality, which may be added by defining new instructions. These additional instructions for processor P2 are represented by the notation I2 in FIG. 1. Similarly, new instructions incorporated in future generation processors P3 and P4 are shown by instructions I3 and I4, respectively. As can be seen, the development of improved processors generally means the addition of new features and instructions which expand the basic instruction set architecture.

Following the introduction of a new microprocessor, a body of software programs is normally written so that, over time, a library of application programs is established for a particular instruction set architecture. Development of new software is usually spurred by consumer demand following the build-up of a sufficiently large processor and platform base. This is shown in the example of FIG. 1 by the notations SW1, SW2, SW3 and SW4. The body of software, SW1, is written for the corresponding instruction set architecture; namely, ISA1. Likewise, software SW2, SW3 and SW4 are written to take advantage of the instructions defined in architectures ISA2, ISA3 and ISA4, respectively. Note that the development of the software base for a particular instruction set architecture usually occurs well after the introduction of the processor (e.g., SW1 follows P1, etc.).

As technology advances, an improved version of the original processor is inevitably developed. In the example of FIG. 1 this improved version is shown as processor P2. Processor P2 provides extensions to the instruction set architecture of its predecessor. These extensions typically include several new instructions, I2. Typically, a newer generation processor attempts to maintain "backward-compatibility" with previous processor generations so that the existing base of software programs can run on the newer generation processor. This is shown in FIG. 1 by the arrow between software base SW1 and the portion of the new instruction set architecture which is compatible with the earlier generation. Hence, the representation ISA2=ISA1+I2.

It should be understood, however, that while the improved processor P2 remains software compatible with its predecessor generation, the software base SW1 may not be able to take advantage of the features and extensions specific to the newer processor. In this case, software programs contained within SW1 cannot execute instructions I2 without modification. Hence, at some point programmers need to develop new code sequences in order to take advantage of the full functionality of the new instruction set architecture.

Unfortunately, in the past this has meant waiting for long periods of time, on the order of years in some cases, for new software programs to be written that use the new processor's new instructions. FIG. 1 illustrates this problem by way of example, wherein a body of new software programs SW2 is not produced until well after the introduction of the instruction set architecture ISA2 for processor P2.

FIG. 1 also illustrates the establishment of software programs SW3 compatible with the instruction set architecture written for processor P3 (i.e., ISA3), and some earlier versions. But as can be seen, programs SW3 do not take advantage of the new instructions I4 defined in processor P4.

Because new application programs tend not to use new instructions following the introduction of a new processor, there has been a disincentive to add instructions in a new instruction set architecture. The reason is that new software code that could utilize the newly defined instructions would not execute on previous or current generation processors. Hence, progress is stymied.

In accordance with the present invention, this problem is overcome by a method which includes adding a set or a family of instructions called "hintable" NOPs to a processor's ISA. (Hintable NOPs are distinguished from ordinary hint instructions which utilize bits to supply a clue to hardware as to how to treat a particular block of data or address.)
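The compatibility property that hintable NOPs are meant to provide can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the implementation described in this specification: the `Machine` class, the `OP_ADD` opcode value, the integer opcode encoding, and the choice of a prefetch hint as the later-attached semantic are all hypothetical. Only the reserved opcode range 0x0F18 through 0x0F1F is taken from the example given in the text.

```python
from dataclasses import dataclass, field

# Eight reserved opcodes; the range follows the example in the text.
HINTABLE_NOPS = range(0x0F18, 0x0F20)
OP_ADD = 0x0001  # hypothetical ordinary instruction: acc += operand


@dataclass
class Machine:
    """Toy processor: acc is architectural state, prefetched is not."""
    generation: int
    acc: int = 0
    prefetched: list = field(default_factory=list)  # hypothetical cache hints

    def execute(self, opcode: int, operand: int = 0) -> None:
        if opcode == OP_ADD:
            self.acc += operand
        elif opcode in HINTABLE_NOPS:
            # Generation 1: reserved opcode with no semantic, so nothing happens.
            # Generation 2: opcode 0x0F18 is assumed to gain a prefetch hint.
            if self.generation >= 2 and opcode == 0x0F18:
                self.prefetched.append(operand)
        else:
            raise ValueError(f"illegal opcode {opcode:#06x}")


def run(generation: int, program: list) -> Machine:
    """Execute a list of (opcode, operand) pairs on a fresh machine."""
    machine = Machine(generation)
    for opcode, operand in program:
        machine.execute(opcode, operand)
    return machine


# A program written for the newer generation: it uses the new hint instruction.
program = [(0x0F18, 0x1000), (OP_ADD, 5), (OP_ADD, 7)]
old, new = run(1, program), run(2, program)
print(old.acc, new.acc)                # architectural result on each generation
print(old.prefetched, new.prefetched)  # only the newer machine acts on the hint
```

In this sketch the same program yields the same accumulator value on both generations, while only the newer machine records the hint. An opcode outside the reserved range still faults on the older machine, which is the incompatibility the reserved range is intended to avoid.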
From an architectural viewpoint, execution of a hintable NOP instruction does absolutely nothing. It does not change the register files or state of the processor.

Furthermore, according to the invented method, no semantic is attached to the hintable NOP instructions in the current generation processor. In other words, the hintable NOPs simply occupy "space" in the ISA of the processor for which it is defined. From an architectural standpoint, it is sufficient to ensure that the hintable NOP instructions remain "no-operations" for the life of the corresponding instruction set architecture.

Typically, once a processor has been introduced, development work begins on the next generation machine. Of course, a considerable time passes before the newer, improved processor becomes commercially available. During this time, however, the number of software programs written for the current generation processor greatly expands. Naturally, these software programs are written to be compatible with the particular instruction set architecture defined for the current processor.

According to the invention, the next step is to attach an actual microarchitecture semantic "hint" to one or more of the NOP instructions in the next generation processor. The attached semantic is added to the instruction so that it can be used to boost performance, decrease power, or perform other new and useful operations. The exact semantic need not be defined at the time the hintable NOP instruction is created in the current generation processor; rather, it is defined and implemented in the future generation of the processor family.

Once the semantic has been defined and implemented for a particular instruction, new software programs can immediately take advantage of it. This means that newly generated software code may use these instructions to run on both new and older processor generations that support this new instruction set architecture.

To recapitulate, the basic concept of the invention is to first reserve a set of currently unused no-operation instructions. These instructions create a "space" in the architecture for future, yet-to-be-seen improvements. Semantics can then be defined in a future generation processor for one or more of the hintable NOPs previously allocated in the architecture. The semantics for the NOPs implement a particular new instruction or operation. Using this method ensures continuous compatibility across several generations of the same processor family and eliminates barriers to developing new applications.

The invention is illustrated in the conceptual notations of FIGS. 2 and 3. FIG. 2 shows the creation of a new processor P6 having an instruction set architecture ISA6 that is an improvement over the previous or current generation processor. In FIG. 2, the current generation processor is represented by ISA5, wherein the improvement comprises additional instructions I6. The set of instructions I6 includes reserved instructions I60 through I67. (Note that in this example the number of instructions reserved is arbitrarily chosen to be eight. This, however, should not be construed as limiting the number of hintable NOPs that may be defined at any time. In other words, the number of hintable NOPs added to an instruction set architecture may be any number.)

As shown, these reserved instructions (I60 through I67) are no-operation instructions (NOPs). They have no semantics and execution of any one of them does not alter an architectural state of the processor. In one embodiment of the present invention, each of the hintable NOPs has a unique opcode. For example, instructions I60 through I67 may have respective opcodes 0X0F18 through 0X0F1F, in hexadecimal notation.

After some time a new future generation processor will be developed. FIG. 3 shows new processor P7 having a corresponding instruction set architecture ISA7. As before, ISA7 consists of additional instructions I7 added to the previous instruction set architecture (ISA6). At this point, however, actual microarchitectural semantics of one or more of the previously reserved instructions I60 through I67 are defined and implemented. This is shown in FIG. 3 wherein I60=NEW0, . . . I67=NEW7.

It is appreciated that the function performed by the new instructions in the next generation processor P7 need not be known at the time of the development of processor P6. In other words, the various operations and functions implemented by instructions NEW0 through NEW7 may comprise innovative functions that would increase performance of the processor. For example, one can imagine new instructions that perform certain data prefetching functions whereby data is prefetched from a secondary level cache (L2) and stored in a primary or first level data cache (L1). Another possibility is to implement an instruction that provides branch prediction hints to allow a CPU to achieve higher performance.

Yet another aspect of the present invention involves the possibility of adding a second new set of reserved NOP instructions to the instruction set architecture defined for processor P7. FIG. 3 illustrates this as additional instructions I70 through I7n. In accordance with the previous discussion, these instructions are reserved as currently unused instructions, but which may be used in future processor generations to implement some useful function. Practitioners in the art will therefore appreciate that the method of the present invention allows software developers and programmers to immediately take advantage of new extensions and functionality following the introduction of a new processor.

With continuing reference to the examples of FIGS. 2 and 3, following the introduction of processor P7, a programmer may write code which uses instruction NEWi without suffering compatibility problems with old processor generations. Stated another way, code written for processor P7 which uses instruction NEWi would run on processor P6; therefore making use of the new instruction strongly attractive to those who write code.

While the method of the present invention has been described in connection with certain embodiments and examples, it should be understood that the broad concept embodied in the invention is not limited to these examples. For instance, whereas in one embodiment the new instructions may assume a standard MOD/RM format of the Intel architecture, other embodiments of the invention may assume different formats compatible with different families of processors. Moreover, the invention is not considered to be limited strictly to complex instruction set or CISC processors. The invention is equally applicable to reduced instruction set or RISC processors. Therefore, the invention is applicable to all types of data processors.

I claim:

1. A method of processor architecture modification comprising the steps of:

(a) defining an instruction set architecture (ISA) for a current generation processor which includes a plurality of no-operation instructions (NOPs) having no associated semantics, and wherein execution of any one of the NOPs does not alter an architectural state of the processor;

(b) waiting until software programs have been written that are compatible with the ISA and which run on the current generation processor;

(c) defining, for at least one of the NOPs, an associated semantic that implements a new instruction in a next generation processor, the next generation processor being an improvement of the current generation processor, such that a new software program that uses the new instruction and which runs on the next generation processor, also runs on the current generation processor.

2. The method according to claim 1 wherein each of the NOPs has a unique opcode.

3. The method according to claim 2 wherein the plurality of NOPs comprise eight NOPs.

4. The method according to claim 3 wherein the unique opcodes of the eight NOPs are represented in hexadecimal notation as 0X0F18 to 0X0F1F.

5. An architecture for a processor which defines a set of instructions that can be incorporated into a program for execution on the processor, the set of instructions including a plurality of no-operation instructions (NOPs), each having a unique opcode and no associated semantics, and wherein execution of any one of the NOPs does not alter an architectural state of the processor;

the NOPs being available for use in an improved version of the processor, wherein definition of an associated semantic for at least one of the NOPs in the improved version implements a new instruction that specifies a new operation.

6. The architecture of claim 5 wherein the new operation alters the architectural state of the improved version of the processor.

7. The architecture of claim 5 wherein the plurality of NOPs comprise eight NOPs.

8. The architecture of claim 7 wherein the unique opcodes of the eight NOPs are represented in hexadecimal notation as 0X0F18 to 0X0F1F.

9. The architecture of claim 5 wherein the new instruction can be incorporated into newly-generated code that is able to run on both the processor and the improved version of the processor.

10. A processor having an architecture which defines a set of instructions that can be incorporated into a program for execution thereon, the set of instructions including a plurality of no-operation instructions (NOPs), each having a unique opcode and no associated semantics, and wherein execution of any one of the NOPs does not alter an architectural state of the processor,

the processor being operable to execute code which includes a new instruction, the code also being executable on a next-generation processor having a new architecture which includes the set of instructions, but with an associated semantic that implements the new instruction for at least one of the NOPs.

11. The processor of claim 10 wherein execution of the new instruction alters the architectural state of the next-generation processor.

12. The processor of claim 10 wherein the plurality of NOPs comprise eight NOPs.

13. The processor of claim 12 wherein the unique opcodes of the eight NOPs are represented in hexadecimal notation as 0X0F18 to 0X0F1F.

***** 